1. Introduction

On mips and mops 
and megaflops, 
and binary 
capacity. 

This report is an attempt to outline a formal structure for the 
study of computer capacity. Several traditional measures will be discussed 
and some new measures will be introduced. Our goals for the use of measures 
of computer capacity include: 

1) Quantification of upper bounds on a given machine's
raw theoretical speed for various kinds of computations.

2) Comparisons between diverse computer systems for some 
set of computations. 

3) Evaluation of the actual performance of a given machine 
on some job mix compared with its theoretical capacity. 

4) Guidelines for improving a given system's cost/performance.

Traditionally, people have often quoted computer speeds in mips
(millions of instructions per second). But the execution of an "instruction"
yields rather different effects on various machines. The range is from some
simple indexing operation on a traditional machine to a vector inner product
instruction on a modern pipeline processor. Thus, as computer organizations
diverged from one another, mops (millions of operations per second) became a
more reasonable measure. But, in many numerical calculations, floating-point
arithmetic operations are the raison d'etre for the computer and logical
operations, shifts, etc., are "overhead". Thus, megaflops (millions of
floating-point operations per second) may be the important measure.

Quoting megaflops is of course quite irrelevant for most computations 
performed in the real world every day. In many computations, e.g., data base
management, file processing, simulation, etc., almost no floating-point 
arithmetic is performed. The primary memory speed and often input/output 
speeds are the most important to quote in evaluating or comparing machines. 
Our formulation will include consideration of the type of computation being 
performed in terms of ratios such as primary memory to processor bandwidth 
used by a computation. 

We will attempt to bring together in a uniform way measures of the 
speeds of various parts of a computer as well as memory size. The two main 
measures which concern us are speed (of processor, primary and secondary 
memory) and size of primary memory. By definition, speeds are given in 
units per second and bits/second is the simplest such measure. It is 
traditional to call the bit rate of a communication channel its capacity. 
Similarly, sizes of memories in bits may be thought of as capacities. Since 
we shall be discussing speeds and sizes together, it seems reasonable to refer 
to the whole notion as "computer capacity". 

In addition to the above machine characteristics, our model will 
include characteristics of the programs being executed. In particular, we are 
concerned with the fractions of a computation which use each of the three major 
parts of a system: processor, primary memory and secondary memory. Thus, our
model could be used by independently measuring machine and program
characteristics, and relating them through the capacity surfaces we derive.

One difficult question is how to deal with the control unit. It has 
the potential to allow the several major parts of a computer to operate 
simultaneously and thereby increase capacity in a major way. We shall briefly 
discuss "serial" control whereby only one function can be performed at a time.
Our major attention will be given to computer systems in which the processor,
primary and secondary memory all can operate simultaneously in an overlapped 
way. The models we discuss can be thought of as assuming a perfect "lookahead" 
control unit. Alternatively, any idleness due to the control unit may be 
considered to be lumped together with the processor. Degradations in system 
capacity due to variously constrained control units could be an interesting 
area for further study. In fact, the control unit could be treated as a fourth 
dimension in Figure 6 of Section 4. 

2. Capacity in Overlapped Machines 

In this section we define processor, memory and system capacity. These
definitions are given in terms of machine parameters (our $\alpha$'s) and program
parameters (our $\beta$'s). There is a good deal of symmetry in much of the following,
and we illustrate this by displaying a number of equations.

Let us consider a clocked machine with a processor, i.e., an
arithmetic and logical unit, operating at maximum bandwidth (i.e., data rate)
$B_p$ bits/second. Let the primary memory bandwidth be $B_m$ bits/second. We define

$$\alpha_{pm} = \frac{B_m}{B_p} > 0 \quad \text{and} \quad \alpha_{mp} = \frac{B_p}{B_m} > 0 .$$

For any given computation, the total available bandwidth of the
processor or memory may not be used. Thus, we define $B_p^u \le B_p$ to be the
bandwidth of the processor which is actually used in some computation. Similarly,
we define $B_m^u \le B_m$ as the used bandwidth of the memory for a given computation.

Also, for any given computation we define

$$\beta_{pm} = \frac{B_p^u + B_m^u}{B_p^u} \ge 1$$

and

$$\beta_{mp} = \frac{B_m^u + B_p^u}{B_m^u} \ge 1 ,$$

so it follows that

$$\beta_{pm} - 1 = \frac{B_m^u}{B_p^u}$$

and

$$\beta_{mp} - 1 = \frac{B_p^u}{B_m^u} .$$

We may interpret $1/\beta_{pm}$ as the fraction of some computation in which the processor
is engaged. Similarly, $1/\beta_{mp} = 1 - 1/\beta_{pm}$ is the fraction of a given computation in
which the memory is engaged.

If we assume that each memory cycle and each processor operation
require the same amount of time, then the above can be interpreted as follows.
For a machine with a control unit which overlaps memory and processor
operation, $1/\beta_{pm}$ is the processor fraction of the total number of instructions
executed or the processor fraction of the total bandwidth used for some
computation. For a machine with a control unit which allows no overlap of
processor and memory operation, $1/\beta_{pm}$ is the processor fraction of the total
number of instructions executed. Obviously, similar statements hold for $1/\beta_{mp}$.
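
As a minimal sketch of the definitions above, the following Python fragment computes $\alpha_{pm}$, $\beta_{pm}$, $\beta_{mp}$ and the two engagement fractions. The function name and the bandwidth figures are illustrative assumptions and do not come from the report.

```python
import math


def machine_and_program_parameters(bp, bm, bp_used, bm_used):
    """Return (alpha_pm, beta_pm, beta_mp) as defined in this section.

    bp, bm           -- maximum processor and memory bandwidths (bits/second)
    bp_used, bm_used -- bandwidths actually used by some computation
    """
    if not (0 < bp_used <= bp and 0 < bm_used <= bm):
        raise ValueError("used bandwidths must be positive and not exceed the maxima")
    alpha_pm = bm / bp
    total_used = bp_used + bm_used
    beta_pm = total_used / bp_used
    beta_mp = total_used / bm_used
    return alpha_pm, beta_pm, beta_mp


# Hypothetical figures: memory twice as fast as the processor,
# processor saturated, memory used at a quarter of its bandwidth.
alpha_pm, beta_pm, beta_mp = machine_and_program_parameters(
    bp=1.0e6, bm=2.0e6, bp_used=1.0e6, bm_used=0.5e6
)
processor_fraction = 1.0 / beta_pm  # share of the computation in the processor
memory_fraction = 1.0 / beta_mp     # share of the computation in the memory

print(alpha_pm, beta_pm, beta_mp, processor_fraction, memory_fraction)
```

The two fractions always sum to one, which is the identity $1/\beta_{mp} = 1 - 1/\beta_{pm}$ stated above.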

Next we consider the notion of the capacity of the processor, the
memory and the combination of the two. We shall define capacities in
bits/second. Since we are interested in maximum possible data rates, we shall
assume that either the memory or the processor bandwidth is saturated in any
given computation we discuss. Thus, all our discussions of capacity will
assume that for the type of computation under consideration no faster data
rate is possible on the machine we are considering.
Let us define

$$\gamma_m = \frac{B_m}{B_m^u} \ge 1$$

and

$$\gamma_p = \frac{B_p}{B_p^u} \ge 1 ,$$

which we call the memory freedom and processor freedom, respectively. When
$\gamma_m = 1$, the computation is said to be memory bound and when $\gamma_p = 1$, the
computation is said to be processor bound. As outlined in the preceding
paragraph, our subsequent discussions of capacity will assume that either
$\gamma_m = 1$ or $\gamma_p = 1$, or both.

We can relate machine parameters ($\alpha$'s), program parameters ($\beta$'s) and
freedoms ($\gamma$'s) as follows. Since

$$\frac{\gamma_m}{\gamma_p} = \frac{B_m B_p^u}{B_p B_m^u} = \alpha_{pm} \frac{B_p^u}{B_m^u}$$

and since

$$\frac{1}{\beta_{pm} - 1} = \frac{B_p^u}{B_m^u} ,$$

we have

$$\frac{\gamma_m}{\gamma_p} = \frac{\alpha_{pm}}{\beta_{pm} - 1} .$$

Since $\alpha_{pm} = 1/\alpha_{mp}$, we can derive a similar expression by interchanging m's and
p's in this equation.
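
The relation between freedoms, machine parameters and program parameters can be checked numerically. The sketch below assumes a hypothetical memory bound computation; the function name and all figures are illustrative only.

```python
import math


def freedoms(bp, bm, bp_used, bm_used):
    """Return (gamma_m, gamma_p), the memory and processor freedoms."""
    return bm / bm_used, bp / bp_used


# Hypothetical memory bound computation: the memory is saturated
# while the processor uses only part of its bandwidth.
bp, bm = 1.0e6, 2.0e6
bp_used, bm_used = 0.4e6, 2.0e6

gamma_m, gamma_p = freedoms(bp, bm, bp_used, bm_used)
alpha_pm = bm / bp
beta_pm = (bp_used + bm_used) / bp_used

# Both sides of gamma_m / gamma_p = alpha_pm / (beta_pm - 1).
lhs = gamma_m / gamma_p
rhs = alpha_pm / (beta_pm - 1.0)
memory_bound = math.isclose(gamma_m, 1.0)

print(lhs, rhs, memory_bound)
```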

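Now we define, for any given computation on any given machine with
overlapped processor and memory, the processor capacity

$$C_p = \begin{cases} \dfrac{\alpha_{pm} B_p}{\beta_{pm} - 1} & \text{if } \alpha_{pm} \le \beta_{pm} - 1 \\ B_p & \text{otherwise.} \end{cases} \tag{1}$$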



Note that $\alpha_{pm} \ge \beta_{pm} - 1$ is equivalent to

$$\frac{B_m}{B_p} \ge \frac{B_m^u}{B_p^u}$$

or

$$\frac{B_m}{B_m^u} \ge \frac{B_p}{B_p^u} ,$$

so $\gamma_m \ge \gamma_p$. But since we are assuming that either $\gamma_m = 1$ or $\gamma_p = 1$, this implies that
$\gamma_p = 1$. Thus, in the processor bound situation our definition sets $C_p = B_p$,
which is the maximum processor data rate.

On the other hand, if $\alpha_{pm} \le \beta_{pm} - 1$, it follows that $\gamma_m \le \gamma_p$, but
since $\gamma_m = 1$ or $\gamma_p = 1$ we conclude that $\gamma_m = 1$, and we are memory bound.
Thus, $B_m = B_m^u$. Now the definition of $C_p$ can be rewritten in this case as

$$C_p = \frac{\alpha_{pm} B_p}{\beta_{pm} - 1} = \frac{B_m}{B_m^u / B_p^u} = B_p^u \gamma_m .$$

But since $\gamma_m = 1$, we have $C_p = B_p^u$ in the case of a memory bound computation.
Thus, the processor capacity is defined to be the fraction of the processor
bandwidth which can be used for this computation, given the fact that memory
bandwidth is saturated.

If we rewrite processor capacity as

$$C_p = \frac{\gamma_m}{\gamma_p} B_p ,$$

we can interpret it as $B_p$ if memory freedom is greater than processor freedom
for some computations and as $B_p$ times the ratio of the freedoms otherwise.
We emphasize that the processor only reaches its maximum capacity $B_p$ when
$\gamma_m \ge \gamma_p$.
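
Equation 1 translates directly into a small function. The following is a sketch under the stated definitions; the function name and the sample machine ($B_p$ = 1 Mbit/second, $\alpha_{pm} = 2$) are assumptions made for illustration.

```python
def processor_capacity(alpha_pm, beta_pm, bp):
    """Overlapped processor capacity C_p in bits/second (Equation 1).

    alpha_pm -- B_m / B_p, a machine parameter (> 0)
    beta_pm  -- (B_p^u + B_m^u) / B_p^u, a program parameter (>= 1)
    bp       -- maximum processor bandwidth B_p in bits/second
    """
    if alpha_pm <= 0.0 or beta_pm < 1.0:
        raise ValueError("require alpha_pm > 0 and beta_pm >= 1")
    if alpha_pm <= beta_pm - 1.0:
        # Memory bound: the processor is limited by the saturated memory.
        return alpha_pm * bp / (beta_pm - 1.0)
    # Processor bound: the processor runs at its full bandwidth.
    return bp


# Hypothetical machine: B_p = 1 Mbit/second and alpha_pm = 2.
for beta_pm in (1.5, 3.0, 5.0):
    print(beta_pm, processor_capacity(alpha_pm=2.0, beta_pm=beta_pm, bp=1.0e6))
```

For this machine the processor stays at $B_p$ up to $\beta_{pm} = 1 + \alpha_{pm} = 3$ and falls off beyond that point, where the memory becomes the bound.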



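We can derive an expression for memory capacity $C_m$ with analogous
characteristics to processor capacity. Thus, we write

$$C_m = \begin{cases} \dfrac{\alpha_{mp} B_m}{\beta_{mp} - 1} & \text{if } \alpha_{mp} \le \beta_{mp} - 1 \\ B_m & \text{otherwise.} \end{cases} \tag{2}$$

Since $\alpha_{mp} B_m = B_p$, $\beta_{mp} - 1 = 1/(\beta_{pm} - 1)$ and $B_m = \alpha_{pm} B_p$, we can express $C_m$ in
terms of $B_p$ as follows:

$$C_m = \begin{cases} (\beta_{pm} - 1) B_p & \text{if } \beta_{pm} - 1 \le \alpha_{pm} \\ \alpha_{pm} B_p & \text{otherwise.} \end{cases} \tag{3}$$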
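
As a consistency check, Equations 2 and 3 should give the same memory capacity when $\beta_{mp}$ is derived from $\beta_{pm}$ through $1/\beta_{mp} = 1 - 1/\beta_{pm}$. The sketch below assumes a hypothetical machine with $B_p = 1$ and $B_m = 2$ (arbitrary units); the function names are illustrative.

```python
import math


def memory_capacity_eq2(alpha_mp, beta_mp, bm):
    """Equation 2: memory capacity expressed in terms of B_m."""
    if alpha_mp <= beta_mp - 1.0:
        return alpha_mp * bm / (beta_mp - 1.0)  # processor bound
    return bm                                   # memory bound


def memory_capacity_eq3(alpha_pm, beta_pm, bp):
    """Equation 3: the same capacity expressed in terms of B_p."""
    if beta_pm - 1.0 <= alpha_pm:
        return (beta_pm - 1.0) * bp             # processor bound
    return alpha_pm * bp                        # memory bound


bp, bm = 1.0, 2.0                # hypothetical bandwidths
alpha_pm, alpha_mp = bm / bp, bp / bm

results = []
for beta_pm in (1.5, 3.0, 5.0):
    beta_mp = beta_pm / (beta_pm - 1.0)  # from 1/beta_mp = 1 - 1/beta_pm
    via_bm = memory_capacity_eq2(alpha_mp, beta_mp, bm)
    via_bp = memory_capacity_eq3(alpha_pm, beta_pm, bp)
    results.append((beta_pm, via_bm, via_bp))
    print(beta_pm, via_bm, via_bp)
```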



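If we define system capacity $C_s$ to be the total system bandwidth
available for any calculation, by properly adding Equations 1 and 3 we obtain

$$C_s = \begin{cases} \left(1 + \dfrac{1}{\beta_{pm} - 1}\right) \alpha_{pm} B_p & \text{if } \alpha_{pm} \le \beta_{pm} - 1 \\ (1 + \beta_{pm} - 1) B_p & \text{otherwise,} \end{cases}$$

so

$$C_s = \begin{cases} \dfrac{\alpha_{pm} \beta_{pm}}{\beta_{pm} - 1} B_p & \text{if } \alpha_{pm} \le \beta_{pm} - 1 \\ \beta_{pm} B_p & \text{otherwise.} \end{cases} \tag{4}$$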



This can be expressed in terms of $B_m$ as

$$C_s = \begin{cases} \dfrac{\alpha_{mp} \beta_{mp}}{\beta_{mp} - 1} B_m & \text{if } \alpha_{mp} \le \beta_{mp} - 1 \\ \beta_{mp} B_m & \text{otherwise.} \end{cases} \tag{5}$$



Note that maximum system capacity occurs when both the memory and processor
are bound, i.e., $\gamma_p = \gamma_m = 1$. Thus, from Equation 4, if $\alpha_{pm} = \beta_{pm} - 1$ we have

$$C_s = \frac{\alpha_{pm} \beta_{pm}}{\beta_{pm} - 1} B_p = \left(1 + \frac{1}{\beta_{pm} - 1}\right) \alpha_{pm} B_p = \left(1 + \frac{1}{\alpha_{pm}}\right) B_m = \left(1 + \frac{B_p}{B_m}\right) B_m = B_m + B_p .$$

Thus, the maximum system capacity is the sum of the maximum processor and
memory bandwidths.
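
The maximum can also be located numerically. The following sketch evaluates Equations 1, 3 and 4 over a sweep of $\beta_{pm}$ values for a hypothetical machine with $B_p = 1$ and $\alpha_{pm} = 2$; the function name, the units and the sweep grid are assumptions.

```python
import math


def overlapped_capacities(alpha_pm, beta_pm, bp):
    """Return (C_p, C_m, C_s) for an overlapped machine (Equations 1, 3, 4)."""
    if alpha_pm <= 0.0 or beta_pm < 1.0:
        raise ValueError("require alpha_pm > 0 and beta_pm >= 1")
    if alpha_pm <= beta_pm - 1.0:
        # Memory bound: memory runs at B_m = alpha_pm * B_p.
        c_p = alpha_pm * bp / (beta_pm - 1.0)
        c_m = alpha_pm * bp
    else:
        # Processor bound: processor runs at B_p.
        c_p = bp
        c_m = (beta_pm - 1.0) * bp
    return c_p, c_m, c_p + c_m


bp, alpha_pm = 1.0, 2.0          # hypothetical: B_p = 1, B_m = 2
bm = alpha_pm * bp

# Sweep beta_pm = 1.0, 1.25, ..., 7.0 and pick the largest system capacity.
betas = [1.0 + 0.25 * k for k in range(25)]
best_beta = max(betas, key=lambda b: overlapped_capacities(alpha_pm, b, bp)[2])
best_cs = overlapped_capacities(alpha_pm, best_beta, bp)[2]

print(best_beta, best_cs, bp + bm)
```

On this grid the largest $C_s$ is found at $\beta_{pm} = 1 + \alpha_{pm}$, where it equals $B_p + B_m$.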

To make matters concrete, we give in Figure 1 examples of capacities
for $\alpha_{pm} = 2$ and various $\beta_{pm}$ values. In Figure 1 we denote activity by X and
inactivity by 0. We show two columns under the label "memory" to denote that
the memory bandwidth is twice the processor bandwidth, i.e., $\alpha_{pm} = 2$. The
capacities are shown under the columns of activity. Overall results are
plotted in Figure 2.

In Figure 3 we plot system and processor capacity for various
values of $\alpha_{pm}$. Note that the processor can perform at its maximum capacity
over a wider range of problems ($\beta_{pm}$ values) for larger $\alpha_{pm}$. Note also that
the memory capacity which is available for memory to memory (or I/O) operations
becomes greater for larger $\alpha_{pm}$. It should be remarked that as $\beta_{pm}$ approaches 1,
reasonable system performance depends on a high frequency of register to
register operations (or cache to cache operations).

Figure 1. Overlapped Processor and Memory, $\alpha_{pm} = 2$ (panels for various $\beta_{pm}$ values, including $\beta_{pm} = 3/2$)

Figure 2. Capacity for $\alpha_{pm} = 2$

Figure 3. Capacities for Various $\alpha_{pm}$ Values

3. Capacity in Non-Overlapped Machines

To contrast the previous section with a simpler machine and
demonstrate how capacities vary as a function of machine organization, we
now disallow the simultaneous operation of memory and processor. However,
we do assume a perfect lookahead control unit. Figure 4 illustrates the
situation for $\alpha_{pm} = 2$.

It may be seen that in the case of non-overlapped processor and
memory, we have (using the notation of the previous section):

$$C_p = \frac{\alpha_{pm} B_p}{\alpha_{pm} + \beta_{pm} - 1} , \tag{6}$$

$$C_m = \frac{\alpha_{pm} (\beta_{pm} - 1) B_p}{\alpha_{pm} + \beta_{pm} - 1} \tag{7}$$

and

$$C_s = \frac{\alpha_{pm} \beta_{pm} B_p}{\alpha_{pm} + \beta_{pm} - 1} . \tag{8}$$

We plot the capacity of a non-overlapped machine for $\alpha_{pm} = 2$ in
Figure 5. Note the contrast with Figure 2, an overlapped machine. Here the
processor and memory capacities only reach their maximum bandwidth at the
limits of $1/\beta_{pm}$. Note also that a good deal less system capacity is left
over for I/O activities.

We can easily show that an overlapped machine's capacities are all
greater than or equal to a non-overlapped machine's. Thus, from Equations
1 and 6 we see that

$$\text{non-overlapped } C_p = \frac{\alpha_{pm} B_p}{\alpha_{pm} + \beta_{pm} - 1} \le \left\{ \begin{array}{l} \dfrac{\alpha_{pm} B_p}{\beta_{pm} - 1} \\ B_p \end{array} \right\} = \text{overlapped } C_p ,$$

since $\alpha_{pm} > 0$ in the first case, and $\beta_{pm} \ge 1$ in the second case. In similar
ways we can show that

$$\text{non-overlapped } C_m \le \text{overlapped } C_m$$

and

$$\text{non-overlapped } C_s \le \text{overlapped } C_s .$$
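
These inequalities can be spot-checked over a grid of parameter values. The sketch below implements Equations 6 to 8 next to the overlapped formulas (Equations 1, 3 and 4); the function names, the grid and the tolerance are assumptions chosen for illustration, and a grid check is of course not a proof.

```python
def overlapped_capacities(alpha_pm, beta_pm, bp):
    """Return (C_p, C_m, C_s) for an overlapped machine (Equations 1, 3, 4)."""
    if alpha_pm <= beta_pm - 1.0:
        c_p = alpha_pm * bp / (beta_pm - 1.0)  # memory bound
        c_m = alpha_pm * bp
    else:
        c_p = bp                               # processor bound
        c_m = (beta_pm - 1.0) * bp
    return c_p, c_m, c_p + c_m


def non_overlapped_capacities(alpha_pm, beta_pm, bp):
    """Return (C_p, C_m, C_s) for a non-overlapped machine (Equations 6-8)."""
    denom = alpha_pm + beta_pm - 1.0
    c_p = alpha_pm * bp / denom
    c_m = alpha_pm * (beta_pm - 1.0) * bp / denom
    return c_p, c_m, c_p + c_m


def overlap_never_hurts(alphas, betas, bp=1.0, tol=1e-12):
    """True if every non-overlapped capacity is <= its overlapped counterpart."""
    for alpha_pm in alphas:
        for beta_pm in betas:
            over = overlapped_capacities(alpha_pm, beta_pm, bp)
            non = non_overlapped_capacities(alpha_pm, beta_pm, bp)
            if any(n > o + tol for n, o in zip(non, over)):
                return False
    return True


ok = overlap_never_hurts(
    alphas=(0.5, 1.0, 2.0, 4.0),
    betas=(1.0, 1.5, 2.0, 3.0, 5.0, 10.0),
)
print(ok)
print(non_overlapped_capacities(2.0, 3.0, 1.0), overlapped_capacities(2.0, 3.0, 1.0))
```

At $\alpha_{pm} = 2$, $\beta_{pm} = 3$ the overlapped machine delivers twice the capacity of the non-overlapped one in every component.
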
Figure 4. Non-overlapped Processor and Memory, $\alpha_{pm} = 2$

Figure 5. Non-overlapped Capacity for $\alpha_{pm} = 2$

4. Processor-Memory-Disk Systems

Now we turn to a complete system with three components—processor 
and primary memory as above, together with a secondary memory which we shall 
refer to as a disk. We shall assume at all times that one of these three 
components is operating at its highest data rate, i.e., its bandwidth is 
saturated. We also assume a control unit which overlaps the operation of the 
processor, the primary memory and the disk. We first give some definitions 
which are analogous to those of Section 2. 



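Let $B_d$ be the disk or I/O bandwidth, and let $B_d^u \le B_d$ be the disk bandwidth
actually used by a computation. Then

$$\alpha_{pd} = \frac{B_d}{B_p} , \quad \alpha_{md} = \frac{B_d}{B_m}$$

and

$$\alpha_{dp} = \frac{B_p}{B_d} , \quad \alpha_{dm} = \frac{B_m}{B_d} .$$

We also define

$$\beta_{pd} = \frac{B_p^u + B_d^u}{B_p^u} , \quad \beta_{md} = \frac{B_m^u + B_d^u}{B_m^u} ,$$

with $\beta_{dp}$ and $\beta_{dm}$ being defined similarly. It follows that processor capacity
may be written as:

$$C_p = \begin{cases}
\dfrac{\alpha_{pm} B_p}{\beta_{pm} - 1} = \dfrac{B_m}{\beta_{pm} - 1} & \text{if } \alpha_{pm} \le \beta_{pm} - 1 \\
\dfrac{\alpha_{pd} B_p}{\beta_{pd} - 1} = \dfrac{B_d}{\beta_{pd} - 1} & \text{if } \alpha_{pd} \le \beta_{pd} - 1 \\
B_p & \text{if } \alpha_{pm} \ge \beta_{pm} - 1 \text{ and } \alpha_{pd} \ge \beta_{pd} - 1 .
\end{cases}$$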



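Similarly, we have for memory capacity:

$$C_m = \begin{cases}
\dfrac{\alpha_{mp} B_m}{\beta_{mp} - 1} = \dfrac{B_p}{\beta_{mp} - 1} & \text{if } \alpha_{pm} \ge \beta_{pm} - 1 \\
\dfrac{\alpha_{md} B_m}{\beta_{md} - 1} = \dfrac{B_d}{\beta_{md} - 1} & \text{if } \alpha_{md} \le \beta_{md} - 1 \\
B_m & \text{if } \alpha_{pm} \le \beta_{pm} - 1 \text{ and } \alpha_{md} \ge \beta_{md} - 1 ,
\end{cases}$$

and for disk capacity:

$$C_d = \begin{cases}
\dfrac{\alpha_{dm} B_d}{\beta_{dm} - 1} = \dfrac{B_m}{\beta_{dm} - 1} & \text{if } \alpha_{md} \ge \beta_{md} - 1 \\
\dfrac{\alpha_{dp} B_d}{\beta_{dp} - 1} = \dfrac{B_p}{\beta_{dp} - 1} & \text{if } \alpha_{dp} \le \beta_{dp} - 1 \\
B_d & \text{if } \alpha_{md} \le \beta_{md} - 1 \text{ and } \alpha_{pd} \le \beta_{pd} - 1 .
\end{cases}$$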



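By summing these capacities for consistent conditions, we obtain
saturated system capacities as follows:

$$C_s = \begin{cases}
\left(1 + \dfrac{1}{\beta_{mp} - 1} + \dfrac{1}{\beta_{dp} - 1}\right) B_p & \text{if } \alpha_{pm} \ge \beta_{pm} - 1 \text{ and } \alpha_{dp} \le \beta_{dp} - 1 \\
\left(1 + \dfrac{1}{\beta_{pm} - 1} + \dfrac{1}{\beta_{dm} - 1}\right) B_m & \text{if } \alpha_{pm} \le \beta_{pm} - 1 \text{ and } \alpha_{md} \ge \beta_{md} - 1 \\
\left(1 + \dfrac{1}{\beta_{pd} - 1} + \dfrac{1}{\beta_{md} - 1}\right) B_d & \text{if } \alpha_{pd} \le \beta_{pd} - 1 \text{ and } \alpha_{md} \le \beta_{md} - 1 .
\end{cases}$$

It should be noted that in each of these three cases, if the conditions are
written as equalities then the maximum capacity is obtained. In each case
this reduces to

$$\max C_s = B_p + B_m + B_d .$$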
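
One way to read the case formulas above is that a computation fixes only the ratios $B_p^u : B_m^u : B_d^u$, and these are scaled up until the first component saturates. The sketch below takes that reading as an assumption; the function name, the dictionary keys and the sample demand ratios are hypothetical, while the bandwidths ($B_p = B$, $B_m = 2B$, $B_d = B/2$ with $B = 1$) match the machine of Figure 6.

```python
import math


def saturated_capacities(bandwidths, used_ratio):
    """Scale a computation's relative demands until one component saturates.

    bandwidths -- dict with keys "p", "m", "d": maxima B_p, B_m, B_d
    used_ratio -- dict with the same keys: relative used bandwidths, all > 0
                  (only the ratios B_p^u : B_m^u : B_d^u matter)
    Returns (capacities, bound): the capacity of each component and the
    list of components that are saturated.
    """
    headroom = {k: bandwidths[k] / used_ratio[k] for k in bandwidths}
    scale = min(headroom.values())
    capacities = {k: used_ratio[k] * scale for k in bandwidths}
    bound = [k for k in bandwidths if math.isclose(headroom[k], scale)]
    return capacities, bound


machine = {"p": 1.0, "m": 2.0, "d": 0.5}

# Demands in the same ratio as the bandwidths: every component saturates.
balanced, balanced_bound = saturated_capacities(machine, {"p": 1.0, "m": 2.0, "d": 0.5})
balanced_cs = sum(balanced.values())

# Equal demands on all three components: the disk saturates first.
disk_heavy, disk_bound = saturated_capacities(machine, {"p": 1.0, "m": 1.0, "d": 1.0})
disk_heavy_cs = sum(disk_heavy.values())

print(balanced_cs, balanced_bound)
print(disk_heavy_cs, disk_bound)
```

In the second case $\beta_{pd} - 1 = \beta_{md} - 1 = 1$, so the disk bound row of the system capacity formula gives $(1 + 1 + 1) B_d = 1.5$, in agreement with the scaled sum.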

To make matters concrete, in Figure 6 we sketch a surface for
$B_p = B$, $B_m = 2B$, and $B_d = B/2$. The processor capacity is shown as a plateau
of height $B$ which runs off to 0 along the memory-disk axis. The top surface
is the system capacity. In the region labelled I, the system is processor
bound, and in II and III it is memory and disk bound, respectively. Where
these three regions meet, the max $C_s = 3.5B$ point is located.

Figure 6. Capacity Space

5. Primary Memory Size vs.

It is well known that there exists a trade-off between primary 
memory size and I/O bandwidth. Our purpose here is to sketch an analysis 
of this trade-off and to relate it to our previous discussion of capacity. 

Let the primary memory size be $N$ words of $w$ bits each, for a total
of $wN$ bits. The time required to fill this memory from a disk of bandwidth
$B_d$ (assuming $B_d < B_m$) is $wN/B_d$ sec.

For simplicity, assume a given computation operates on the entire
memory. Assume the computation requires $N^a$ time steps. For example, given
an $n \times n$ matrix, an $n^3$ step algorithm would give $a = 3/2$, since $n^2 = N$ if the
matrix (or a single $n \times n$ partition) fills primary memory. Now the time
required for the entire computation would be $wN^a/C_p$ secs.

On the average, the system would be balanced if the processing time
were equal to the input time (assuming no output), that is:

$$\frac{wN^a}{C_p} = \frac{wN}{B_d}$$

or

$$N^{a-1} = \frac{C_p}{B_d} ,$$

which gives us

$$N = \left(\frac{C_p}{B_d}\right)^{\frac{1}{a-1}} \tag{9}$$

as a relationship between memory size $N$, I/O bandwidth $B_d$, and processor
capacity $C_p$.

The above model can be easily refined in various ways to provide 
for input and output of data arrays, to provide for multiple buffering, and 
so on. 






6. Conclusion 

The point of this report is to provide a framework for the study 
of computer capacity. We have explored several aspects of the question and 
Figure 6 shows a system capacity surface as a function of processor, memory
and disk bandwidth. For a given class of computations, this surface
corresponds to a memory size given by Equation 9 in Section 5.

While we have glossed over many details, the model described here 
could be useful in the various ways mentioned in the Introduction. 

For example, if we were given a set of computations and a machine 
configuration we could easily determine a Figure 6 type surface from the 
machine parameters. From the computational algorithms, we could estimate 
the various $\beta$ values as discussed in Section 2. This would allow a
determination of our operating point in capacity space. While the ideal 
point is where $C_s = C_p + C_m + C_d$, a prudent region is probably somewhere
between that point and the processor corner of Figure 6 for "numerical"
problems. For "business"-type problems it may be between there and the
memory corner of Figure 6. For the class of algorithms under consideration, 
Equation 9 could be used to make memory size trade-offs. 

Given some qualitative idea of the operating rules a user prefers, 
one could use this model to make quantitative sensitivity studies of capacity 
as a function of bandwidth and memory size. This could lead to improved 
system cost/effectiveness. 

Note that for any given capacity surface, degradation due to 
operating system overhead, etc., can be quantified by plotting actual 
performance data in capacity space. In this case, the surfaces shown will 
serve as theoretical upper bounds on system performance. 



